Configuration-Dependent Robot Kinematics Model and Calibration

Lu, Chen-Lung, He, Honglu, Julius, Agung, Wen, John T.

arXiv.org Artificial Intelligence

Abstract-- Accurate robot kinematics is essential for precise tool placement in articulated robots, but non-geometric factors can introduce configuration-dependent model discrepancies. This paper presents a configuration-dependent kinematic calibration framework for improving accuracy across the entire workspace. Local Product-of-Exponential (POE) models, selected for their parameterization continuity, are identified at multiple configurations and interpolated into a global model. Inspired by joint gravity load expressions, we employ Fourier basis function interpolation parameterized by the shoulder and elbow joint angles, achieving accuracy comparable to neural network and autoencoder methods but with substantially higher training efficiency. Validation on two 6-DoF industrial robots shows that the proposed approach reduces the maximum positioning error by over 50%, meeting the sub-millimeter accuracy required for cold spray manufacturing. Robots with larger configuration-dependent discrepancies benefit even more. A dual-robot collaborative task demonstrates the framework's practical applicability and repeatability.
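The Fourier-basis interpolation idea from this abstract can be sketched in a simplified form. Everything below is illustrative, not the paper's actual method: the basis order, the joint indexing (q2 = shoulder, q3 = elbow), and the three-component "local parameters" are assumptions, and the real framework interpolates full local POE models rather than a synthetic linear target.

```python
import numpy as np

def fourier_features(q2, q3, order=2):
    """Fourier basis in the shoulder (q2) and elbow (q3) joint angles:
    [1, cos(k*q2), sin(k*q2), cos(k*q3), sin(k*q3)] for k = 1..order."""
    feats = [1.0]
    for k in range(1, order + 1):
        feats += [np.cos(k * q2), np.sin(k * q2),
                  np.cos(k * q3), np.sin(k * q3)]
    return np.array(feats)

rng = np.random.default_rng(0)
configs = rng.uniform(-np.pi, np.pi, size=(50, 2))        # sampled (q2, q3) pairs

# Hypothetical ground truth: local corrections that vary smoothly with config.
Phi = np.stack([fourier_features(*c) for c in configs])   # (50, 9) for order=2
true_W = rng.normal(size=(9, 3))                          # 3 correction components
local_params = Phi @ true_W                               # "identified" per config

# Fit interpolation weights by least squares, then predict at a new config.
W, *_ = np.linalg.lstsq(Phi, local_params, rcond=None)
pred = fourier_features(0.3, -0.7) @ W                    # global-model query
```

Because the synthetic targets are noiseless and the 50 samples overdetermine the 9 basis weights, the least-squares fit recovers the generating weights exactly; with real identified models the fit would be approximate.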


Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

Wang, Maggie, Tian, Stephen, Swann, Aiden, Shorinwa, Ola, Wu, Jiajun, Schwager, Mac

arXiv.org Artificial Intelligence

Phys2Real is a real-to-sim-to-real pipeline for robotic manipulation that combines VLM-based physical parameter estimation with interaction-based adaptation through uncertainty-aware fusion. It comprises three stages: (I) real-to-sim: object reconstruction from segmented Gaussian Splats into simulation-ready meshes; (II) policy learning: reinforcement learning of policies conditioned on physical parameters such as the center of mass (CoM) of an object; and (III) sim-to-real transfer: uncertainty-aware fusion of VLM priors and interaction-based estimates for online adaptation.

Abstract-- Learning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained in simulation present a scalable alternative, effective sim-to-real transfer remains challenging, particularly for tasks that require precise dynamics. To address this, we propose Phys2Real, a real-to-sim-to-real RL pipeline that combines vision-language model (VLM)-inferred physical parameter estimates with interactive adaptation through uncertainty-aware fusion. Our approach consists of three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions over physical parameters, and (3) online physical parameter estimation from interaction data. On planar pushing tasks of a T-block with varying center of mass (CoM) and a hammer with an off-center mass distribution, Phys2Real achieves substantial improvements over a domain randomization baseline: 100% vs. 79% success rate for the bottom-weighted T-block, 57% vs. 23% for the challenging top-weighted T-block, and 15% faster average task completion for hammer pushing. Ablation studies indicate that the combination of VLM and interaction information is essential for success.
Deploying robotic manipulation policies trained in simulation to the real world remains a fundamental challenge, especially for tasks requiring fine-grained physical dynamics. Robots must adapt to varying object properties such as friction, mass distribution, and compliance, which significantly affect manipulation outcomes but are difficult to model precisely. While learning from demonstrations has shown significant promise, it often lacks the physical grounding and reasoning needed to adapt to novel objects.
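The uncertainty-aware fusion described above can be illustrated under a scalar-Gaussian assumption. This is a minimal sketch, not Phys2Real's actual fusion rule: the numbers are hypothetical, and the paper's parameter distributions may not be scalar Gaussians. The idea shown is precision-weighted averaging, which pulls the fused estimate toward whichever source (VLM prior or interaction-based estimate) reports lower variance.

```python
def fuse_gaussian(mu_prior, var_prior, mu_obs, var_obs):
    """Precision-weighted fusion of two independent Gaussian estimates.
    The fused variance is never larger than either input variance."""
    prec = 1.0 / var_prior + 1.0 / var_obs
    var = 1.0 / prec
    mu = var * (mu_prior / var_prior + mu_obs / var_obs)
    return mu, var

# Hypothetical numbers: a broad VLM prior on a CoM coordinate (meters)
# and a narrower interaction-based estimate from pushing data.
mu, var = fuse_gaussian(mu_prior=0.00, var_prior=0.04,
                        mu_obs=0.05, var_obs=0.01)
```

With these inputs the fused mean lands much closer to the interaction-based estimate (0.04 vs. the prior's 0.00), and the fused variance (0.008) is tighter than either source.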



YawSitter: Modeling and Controlling a Tail-Sitter UAV with Enhanced Yaw Control

Habel, Amir, Mehboob, Fawad, Sam, Jeffrin, Fortin, Clement, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence

Achieving precise lateral motion modeling and decoupled control in hover remains a significant challenge for tail-sitter Unmanned Aerial Vehicles (UAVs), primarily due to complex aerodynamic couplings and the absence of well-defined lateral dynamics. This paper presents a novel modeling and control strategy that enhances yaw authority and lateral motion by introducing a sideslip force model derived from differential propeller slipstream effects acting on the fuselage under differential thrust. The resulting lateral force along the body y-axis enables yaw-based lateral position control without inducing roll coupling. The control framework employs a YXZ Euler rotation formulation to accurately represent attitude and incorporate gravitational components while directly controlling yaw in the y-axis, thereby improving lateral dynamic behavior and avoiding singularities. The proposed approach is validated through trajectory-tracking simulations conducted in a Unity-based environment. Tests on both rectangular and circular paths in hover mode demonstrate stable performance, with low mean absolute position errors and yaw deviations constrained within 5.688 degrees. These results confirm the effectiveness of the proposed lateral force generation model and provide a foundation for the development of agile, hover-capable tail-sitter UAVs.
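A YXZ Euler rotation formulation like the one mentioned above can be sketched as a composition of elementary rotations. The angle-to-axis assignment and sign conventions here are assumptions for illustration; the paper's exact formulation may differ. The sketch also shows the gravity vector expressed in the body frame, since incorporating gravitational components is one stated motivation for the YXZ ordering.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def euler_yxz(a_y, a_x, a_z):
    """Intrinsic Y-X-Z composition (illustrative angle assignment)."""
    return rot_y(a_y) @ rot_x(a_x) @ rot_z(a_z)

R = euler_yxz(0.3, -0.2, 0.1)                   # body-to-world rotation
g_body = R.T @ np.array([0.0, 0.0, -9.81])      # gravity seen in body frame
```

Any valid composition of elementary rotations yields a proper rotation matrix (orthonormal, determinant +1), and rotating gravity into the body frame preserves its magnitude.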


Rich State Observations Empower Reinforcement Learning to Surpass PID: A Drone Ball Balancing Study

Liu, Mingjiang, Huang, Hailong

arXiv.org Artificial Intelligence

Abstract-- This paper addresses a drone ball-balancing task, in which a drone stabilizes a ball atop a movable beam through cable-based interaction. We propose a hierarchical control framework that decouples high-level balancing policy from low-level drone control, and train a reinforcement learning (RL) policy to handle the high-level decision-making. Simulation results show that the RL policy achieves superior performance compared to carefully tuned PID controllers within the same hierarchical structure. Through systematic comparative analysis, we demonstrate that RL's advantage stems not from improved parameter tuning or inherent nonlinear mapping capabilities, but from its ability to effectively utilize richer state observations. These findings underscore the critical role of comprehensive state representation in learning-based systems and suggest that enhanced sensing could be instrumental in improving controller performance.
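The abstract's central claim, that the PID baseline is limited by its observations rather than by tuning, can be made concrete with a textbook discrete PID. This is a generic sketch, not the paper's controller; all gains and signals are hypothetical. The structural point is that a PID maps a single scalar error to a command, whereas an RL policy can condition on the full observation vector (e.g., ball position, beam angle, cable state, and their velocities).

```python
class PID:
    """Discrete PID controller acting on one scalar error signal."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error * self.dt
        # No derivative term on the very first step (no previous error yet).
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

ctrl = PID(kp=2.0, ki=0.1, kd=0.5, dt=0.01)
u0 = ctrl.step(1.0)   # first step: P and I terms only
u1 = ctrl.step(0.9)   # derivative term now active
```

However richly the three gains are tuned, the controller's input remains one number per step; richer sensing cannot enter this interface, which is the comparison the paper's analysis isolates.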


Human Motion Capture from Loose and Sparse Inertial Sensors with Garment-aware Diffusion Models

Ilic, Andela, Jiang, Jiaxi, Streli, Paul, Liu, Xintong, Holz, Christian

arXiv.org Artificial Intelligence

Motion capture using sparse inertial sensors has shown great promise due to its portability and lack of occlusion issues compared to camera-based tracking. Existing approaches typically assume that IMU sensors are tightly attached to the human body. However, this assumption often does not hold in real-world scenarios. In this paper, we present Garment Inertial Poser (GaIP), a method for estimating full-body poses from sparse and loosely attached IMU sensors. We first simulate IMU recordings using an existing garment-aware human motion dataset. Our transformer-based diffusion models synthesize loose IMU data and estimate human poses from these challenging signals. We also demonstrate that incorporating garment-related parameters during training on loose IMU data effectively maintains expressiveness and enhances the ability to capture variations introduced by looser or tighter garments. Our experiments show that our diffusion methods trained on simulated and synthetic data outperform state-of-the-art inertial full-body pose estimators, both quantitatively and qualitatively, opening up a promising direction for future research on motion capture from such realistic sensor placements.
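The first step above, simulating IMU recordings from motion data, can be illustrated in a heavily simplified form. The trajectory, time step, and gravity convention below are hypothetical, and a real garment-aware simulation would additionally model sensor orientation, bias, noise, and garment dynamics. The core physical fact sketched here is that an ideal accelerometer measures specific force: linear acceleration minus gravity.

```python
import numpy as np

dt = 0.01
t = np.arange(0.0, 1.0, dt)

# Hypothetical world-frame trajectory of a sensor attachment point.
pos = np.stack([np.sin(t), np.zeros_like(t), 0.05 * t], axis=1)

# Second finite difference gives world-frame linear acceleration.
acc_world = np.gradient(np.gradient(pos, dt, axis=0), dt, axis=0)

# Ideal accelerometer (world-aligned frame): specific force = a - g.
g = np.array([0.0, 0.0, -9.81])
acc_meas = acc_world - g          # at rest this reads (0, 0, +9.81)
```

Away from the trajectory endpoints (where one-sided differences are less accurate), the x-channel of the simulated acceleration tracks the analytic second derivative -sin(t), and the z-channel reads approximately +9.81 m/s² since the vertical motion is linear in time.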


Graph-based Robot Localization Using a Graph Neural Network with a Floor Camera and a Feature Rich Industrial Floor

Brämer, Dominik, Kleingarn, Diana, Urbann, Oliver

arXiv.org Artificial Intelligence

Accurate localization represents a fundamental challenge in robotic navigation. Traditional methodologies, such as Lidar or QR-code-based systems, suffer from inherent scalability and adaptability constraints, particularly in complex environments. In this work, we propose an innovative localization framework that harnesses flooring characteristics by employing graph-based representations and Graph Convolutional Networks (GCNs). Our method uses graphs to represent floor features, which helps localize the robot more accurately (0.64 cm error) and more efficiently than comparing individual image features. Additionally, this approach successfully addresses the kidnapped robot problem in every frame without requiring complex filtering processes.
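The graph-based representation mentioned above can be sketched with one generic GCN propagation step. This is the standard Kipf-Welling normalization on a toy graph, not the paper's architecture: the node features, adjacency, and identity weight matrix are all placeholders standing in for floor-feature descriptors and learned weights.

```python
import numpy as np

# Toy graph of floor features: 4 nodes with 2-D descriptors,
# edges connecting spatially nearby features.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} X W).
A_hat = A + np.eye(4)                       # add self-loops
d = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # symmetric normalization
W = np.eye(2)                               # identity weights, illustration only
H = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)
```

Each node's updated descriptor mixes in its neighbors' descriptors, which is what lets a graph of features be matched as a structure rather than as independent image patches.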


Globally Optimal Data-Association-Free Landmark-Based Localization Using Semidefinite Relaxations

Korotkine, Vassili, Cohen, Mitchell, Forbes, James Richard

arXiv.org Artificial Intelligence

Abstract-- This paper proposes a semidefinite relaxation for landmark-based localization with unknown data associations in planar environments. The proposed method simultaneously solves for the optimal robot states and data associations in a globally optimal fashion. Relative position measurements to a fixed set of known landmarks are used, but the data association is unknown in that the robot does not know which landmark each measurement is generated from. The relaxation is shown to be tight in a majority of cases for moderate noise levels. The proposed algorithm is compared to local Gauss-Newton baselines initialized at the dead-reckoned trajectory, and is shown to significantly improve convergence to the problem's global optimum in simulation and experiment. Estimating the state of a robot from noisy and incomplete sensor data is a central task associated with autonomy. In the landmark-based localization task, the robot infers its position and orientation from measurements of landmarks with known positions. State estimation methods for localization can be split into filtering methods and batch optimization methods [1].
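The difficulty of unknown data association can be seen in a tiny toy instance. The sketch below is purely illustrative (noiseless, position-only, three landmarks): it enumerates all 3! measurement-to-landmark assignments and, for each fixed assignment, computes the least-squares-optimal position in closed form. The point of the paper's semidefinite relaxation is precisely to avoid this combinatorial enumeration while retaining global optimality.

```python
import itertools
import numpy as np

# Known 2D landmark positions and relative-position measurements whose
# landmark identities are unknown (toy, noiseless instance).
landmarks = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
true_pos = np.array([1.0, 1.0])
meas = landmarks[[2, 0, 1]] - true_pos      # shuffled associations

best_cost, best = np.inf, None
for perm in itertools.permutations(range(3)):
    # With the association fixed, the optimal position is the average
    # of landmark-minus-measurement estimates.
    pos = np.mean(landmarks[list(perm)] - meas, axis=0)
    cost = np.sum((landmarks[list(perm)] - pos - meas) ** 2)
    if cost < best_cost:
        best_cost, best = cost, (pos, perm)
```

Brute force recovers the true position and association here, but the number of assignments grows factorially with the number of measurements, which is what makes a convex relaxation attractive.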


Graph Neural Network Surrogates for Contacting Deformable Bodies with Necessary and Sufficient Contact Detection

Dubey, Vijay K., Haese, Collin E., Gültekin, Osman, Dalton, David, Rausch, Manuel K., Fuhg, Jan N.

arXiv.org Artificial Intelligence

Surrogate models for the rapid inference of nonlinear boundary value problems in mechanics are helpful in a broad range of engineering applications. However, effective surrogate modeling of applications involving the contact of deformable bodies, especially in the context of varying geometries, is still an open issue. In particular, existing methods are confined to rigid body contact or, at best, contact between rigid and soft objects with well-defined contact planes. Furthermore, they employ contact or collision detection filters that serve as a rapid test but use only necessary, not sufficient, conditions for detection. In this work, we present a graph neural network architecture that utilizes continuous collision detection and, for the first time, incorporates sufficient conditions designed for contact between soft deformable bodies. We test its performance on two benchmarks, including a problem in soft tissue mechanics of predicting the closed state of a bioprosthetic aortic valve. We find a regularizing effect from adding additional contact terms to the loss function, leading to better generalization of the network. These benefits hold for simple contact at similar planes and element normal angles, and complex contact at differing planes and element normal angles. We also demonstrate that the framework can handle varying reference geometries. However, such benefits come with high computational costs during training, resulting in a trade-off that may not always be favorable. We quantify the training cost and the resulting inference speedups on various hardware architectures. Importantly, our graph neural network implementation results in up to a thousand-fold speedup for our benchmark problems at inference.
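The necessary-versus-sufficient distinction in collision detection can be illustrated in a simplified 2D setting, with line segments standing in for deforming elements and general position assumed (no collinear or endpoint-touching cases). This is a generic geometry sketch, not the paper's continuous collision detection: a bounding-box overlap test is a fast necessary condition that can report false-positive contacts, while an exact intersection test is both necessary and sufficient.

```python
def aabb_overlap(p1, p2, q1, q2):
    """Necessary condition only: axis-aligned bounding boxes overlap."""
    for i in range(2):
        lo1, hi1 = min(p1[i], p2[i]), max(p1[i], p2[i])
        lo2, hi2 = min(q1[i], q2[i]), max(q1[i], q2[i])
        if hi1 < lo2 or hi2 < lo1:
            return False
    return True

def orient(a, b, c):
    """Signed area of triangle abc (positive = counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, q1, q2):
    """Necessary AND sufficient test for segments in general position:
    each segment's endpoints straddle the other segment's line."""
    return (orient(p1, p2, q1) * orient(p1, p2, q2) < 0
            and orient(q1, q2, p1) * orient(q1, q2, p2) < 0)

# Bounding boxes overlap, yet the segments never touch: the necessary-only
# filter reports a false-positive candidate contact.
p1, p2 = (0.0, 0.0), (1.0, 1.0)
q1, q2 = (0.0, 1.0), (0.4, 0.7)
```

A filter built only on the first test admits spurious contact pairs; incorporating the sufficient condition prunes them, which is the distinction the abstract draws for soft-body contact.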